AITopics

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.69)

Neural Information Processing SystemsJun-14-2026, 04:47:20 GMT

Deno-IF: Unsupervised Noisy Visible and Infrared Image Fusion Method

Most image fusion methods are designed for ideal scenarios and struggle to handle noise. Existing noise-aware fusion methods are supervised and heavily rely on constructed paired data, limiting performance and generalization. This paper proposes a novel unsupervised noisy visible and infrared image fusion method, comprising two key modules. First, when only noisy source images are available, a convolutional low-rank optimization module decomposes clean components based on convolutional low-rank priors, guiding subsequent optimization. The unsupervised approach eliminates data dependency and enhances generalization across various and variable noise. Second, a unified network jointly realizes denoising and fusion. It consists of both intra-modal recovery and inter-modal recovery and fusion, also with a convolutional low-rankness loss for regularization. By exploiting the commonalities of denoising and fusion, the joint framework significantly reduces network complexity while expanding functionality. Extensive experiments validate the effectiveness and generalization of the proposed method for image fusion under various and variable noise conditions.

artificial intelligence, name change, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing SystemsFeb-15-2026, 19:44:27 GMT

Facilitating Multimodal Classification via Dynamically Learning Modality Gap Y ang

Multimodal learning falls into the trap of the optimization dilemma due to the modality imbalance phenomenon, leading to unsatisfactory performance in real applications.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Communications > Social Media (0.68)
(4 more...)

Neural Information Processing SystemsFeb-12-2026, 00:47:09 GMT

cf7700139af1fa346d2f57f1f5c26c18-Supplemental-Conference.pdf

category, gcn, qart, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Alvarez-Trejos, Juan Ignacio, Balanya, Sergio A., Ramos, Daniel, Lozano-Diez, Alicia

Probabilistic Fusion and Calibration of Neural Speaker Diarization Models

arXiv.org Artificial IntelligenceDec-4-2025

End-to-End Neural Diarization (EEND) systems produce frame-level probabilistic speaker activity estimates, yet since evaluation focuses primarily on Diarization Error Rate (DER), the reliability and calibration of these confidence scores have been largely neglected. When fusing multiple diarization systems, DOVER-Lap remains the only established approach, operating at the segment level with hard decisions. We propose working with continuous probability outputs, which enables more sophisticated fusion and calibration techniques that can leverage model uncertainty and complementary strengths across different architectures. This paper presents the first comprehensive framework for calibrating and fusing EEND models at the probability level. We investigate two output formulations (multilabel and powerset representations) and their impact on calibration and fusion effectiveness. Through extensive experiments on the CallHome two-speaker benchmark, we demonstrate that proper calibration provides substantial improvements even for individual models (up to 19% relative DER reduction), in some cases mitigating the absence of domain adaptation. We reveal that joint calibration in powerset space consistently outperforms independent per-speaker calibration, that fusion substantially improves over individual models, and that the Fuse-then-Calibrate ordering generally outperforms both calibrating before fusion and uncalibrated fusion while requiring calibration of only a single combined model. Our best configuration outperforms DOVER-Lap in terms of DER while providing reliable confidence estimates essential for downstream applications. This work proposes best practices for probability-level fusion of EEND systems and demonstrates the advantages of leveraging soft outputs over hard decisions.

artificial intelligence, calibration, machine learning, (17 more...)

2511.22696

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.94)

Jawaid, Mohsi, Märtens, Marcus, Chin, Tat-Jun

Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting

arXiv.org Artificial IntelligenceNov-4-2025

Spacecraft pose estimation is crucial for autonomous in-space operations, such as rendezvous, docking and on-orbit servicing. Vision-based pose estimation methods, which typically employ RGB imaging sensors, is a compelling solution for spacecraft pose estimation, but are challenged by harsh lighting conditions, which produce imaging artifacts such as glare, over-exposure, blooming and lens flare. Due to their much higher dynamic range, neuromorphic or event sensors are more resilient to extreme lighting conditions. However, event sensors generally have lower spatial resolution and suffer from reduced signal-to-noise ratio during periods of low relative motion. A beam-splitter prism was employed to achieve precise optical and temporal alignment. Then, a RANSAC-based technique was developed to fuse the information from the RGB and event channels to achieve pose estimation that leveraged the strengths of the two modalities. The pipeline was complemented by dropout uncertainty estimation to detect extreme conditions that affect either channel. To benchmark the performance of the proposed event-RGB fusion method, we collected a comprehensive real dataset of RGB and event data for satellite pose estimation in a laboratory setting under a variety of challenging illumination conditions. Encouraging results on the dataset demonstrate the efficacy of our event-RGB fusion approach and further supports the usage of event sensors for spacecraft pose estimation. To support community research on this topic, our dataset has been released publicly. Keywords: event-based pose estimation, rendezvous, domain gap, sensor fusion, close proximity, harsh lighting1. Introduction Spacecraft pose estimation is the problem of determining the 6-degrees-of-freedom (6DoF) pose consisting of the position and orientation of a space-borne object, typically a satellite. It is a critical step in a wide range of space applications, including rendezvous, close proximity operations, debris removal, refueling and on-orbit servicing [1, 2, 3, 4]. Robust pose estimation is paramount to safely and effectively executing these tasks [5, 6]. Several types of sensor technologies can be employed for spacecraft pose estimation, but they are all subject to size-weight-power and cost (SWaP-C) constraints. Optical sensors such as RGB imaging sensors are favored due to their low SWaP-C requirements, high resolution and the availability of established vision-based algorithms. However, operating in the space environment can present nontrivial challenges to vision-based systems.

artificial intelligence, machine learning, pose estimation, (20 more...)

doi: 10.1016/j.ast.2025.111039

2507.05698

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.45)

Industry: Energy > Renewable (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Guerra-Manzanares, Alejandro, Shamout, Farah E.

MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning

arXiv.org Artificial IntelligenceOct-21-2025

The aim of multimodal neural networks is to combine diverse data sources, referred to as modalities, to achieve enhanced performance compared to relying on a single modality. However, training of multimodal networks is typically hindered by modality overfitting, where the network relies excessively on one of the available modalities. This often yields sub-optimal performance, hindering the potential of multimodal learning and resulting in marginal improvements relative to unimodal models. In this work, we present the Modality-Informed Learning ratE Scheduler (MILES) for training multimodal joint fusion models in a balanced manner. MILES leverages the differences in modality-wise conditional utilization rates during training to effectively balance multimodal learning. The learning rate is dynamically adjusted during training to balance the speed of learning from each modality by the multimodal model, aiming for enhanced performance in both multimodal and unimodal predictions. We extensively evaluate MILES on four multimodal joint fusion tasks and compare its performance to seven state-of-the-art baselines. Our results show that MILES outperforms all baselines across all tasks and fusion methods considered in our study, effectively balancing modality usage during training. This results in improved multimodal performance and stronger modality encoders, which can be leveraged when dealing with unimodal samples or absent modalities. Overall, our work highlights the impact of balancing multimodal learning on improving model performance.

artificial intelligence, machine learning, modality, (16 more...)

2510.17394

Country:

Asia > Middle East > UAE (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

arXiv.org Artificial IntelligenceOct-14-2025

Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion

Shi, Jiabao, Qi, Minfeng, Zhang, Lefeng, Wang, Di, Zhao, Yingjie, Li, Ziying, Xing, Yalong, Li, Ningran

Multimodal text-to-image generation remains constrained by the difficulty of maintaining semantic alignment and professional-level detail across diverse visual domains. We propose a multi-agent reinforcement learning framework that coordinates domain-specialized agents (e.g., focused on architecture, portraiture, and landscape imagery) within two coupled subsystems: a text enhancement module and an image generation module, each augmented with multimodal integration components. Agents are trained using Proximal Policy Optimization (PPO) under a composite reward function that balances semantic similarity, linguistic visual quality, and content diversity. Cross-modal alignment is enforced through contrastive learning, bidirectional attention, and iterative feedback between text and image. Across six experimental settings, our system significantly enriches generated content (word count increased by 1614%) while reducing ROUGE-1 scores by 69.7%. Among fusion methods, Transformer-based strategies achieve the highest composite score (0.521), despite occasional stability issues. Multimodal ensembles yield moderate consistency (ranging from 0.444 to 0.481), reflecting the persistent challenges of cross-modal semantic grounding. These findings underscore the promise of collaborative, specialization-driven architectures for advancing reliable multimodal generative systems.

artificial intelligence, deep learning, machine learning, (14 more...)

2510.10633

Country: Asia > China (0.93)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 05:54:23 GMT

71b17f00017da0d73823ccf7fbce2d4f-Paper-Conference.pdf

dataset, learning, modality, (13 more...)

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Communications > Social Media (0.68)
(4 more...)

Taewan Kim, Joydeep Ghosh

On Single Source Robustness in Deep Fusion Models

Neural Information Processing SystemsOct-3-2025, 03:11:45 GMT

One natural way of improving a model's performance is to make use of multiple input sources relevant to a given task so that enough information can be extracted to build strong features.

artificial intelligence, information fusion, machine learning, (16 more...)

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)